Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
نویسندگان
چکیده
The integration of multiword expressions in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly pre-identified. This paper evaluates two empirical strategies to integrate multiword units in a real constituency parsing context and shows that the results are not as promising as has sometimes been suggested. Firstly, we show that pregrouping multiword expressions before parsing with a state-of-the-art recognizer improves multiword recognition accuracy and unlabeled attachment score. However, it has no statistically significant impact in terms of F-score as incorrect multiword expression recognition has important side effects on parsing. Secondly, integrating multiword expressions in the parser grammar followed by a reranker specific to such expressions slightly improves all evaluation metrics.
منابع مشابه
Strategies for Contiguous Multiword Expression Analysis and Dependency Parsing
In this paper, we investigate various strategies to predict both syntactic dependency parsing and contiguous multiword expression (MWE) recognition, testing them on the dependency version of French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013). Our work focuses on using an alternative representation of syntactically regular MWEs, which capt...
متن کاملAccounting for Contiguous Multiword Expressions in Shallow Parsing
In this paper, we focus on chunking including contiguous multiword expression recognition, namely super-chunking. In particular, we present different strategies to improve a superchunker based on Conditional Random Fields by combining it with a finite-state symbolic super-chunker driven by lexical and grammatical resources. We display a substantial gain of 7.6 points in terms of overall accuracy.
متن کاملStratégies discriminantes pour intégrer la reconnaissance des mots composés dans un analyseur syntaxique en constituants
We propose two discriminative strategies to integrate compound word recognition in a parsing context: (i) compound pregrouping with Conditional Random Fields before parsing, (ii) reranking parses with a maximum entropy model after parsing. These discriminative models integrate features dedicated to compounds, some of theme being computed from external lexical resources. We show that the pregrou...
متن کاملParsing Models for Identifying Multiword Expressions
Multiword expressions lie at the syntax/semantics interface and have motivated alternative theories of syntax like Construction Grammar. Until now, however, syntactic analysis and multiword expression identification have been modeled separately in natural language processing. We develop two structured prediction models for joint parsing and multiword expression identification. The first is base...
متن کاملDeep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework
We explore the consequences of representing token segmentations as hierarchical structures (trees) for the task of Multiword Expression (MWE) recognition, in isolation or in combination with dependency parsing. We propose a novel representation of token segmentation as trees on tokens, resembling dependency trees. Given this new representation, we present and evaluate two different architecture...
متن کامل